{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1f6bd03d-1b8b-45a0-bc2c-5a13f1a5d8d3",
   "metadata": {},
   "source": [
    "# LM317 Voltage Regulator Datasheet Structured Extraction\n",
    "\n",
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_cloud_services/blob/main/examples/extract/lm317_structured_extraction.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>\n",
    "\n",
    "This notebook demonstrates an agentic document workflow using LlamaExtract to process an LM317 voltage regulator datasheet. In this example, we define a structured extraction schema that converts key technical fields into standardized subfields. For instance, the output voltage is split into a minimum and maximum value with a defined unit, and we capture page citations for each extracted field.\n",
    "\n",
    "The target user is an electronics engineer at a component manufacturing company who needs to consolidate datasheet information into a standardized specification sheet for design and quality control.\n",
    "\n",
    "This approach reduces manual data entry, improves extraction accuracy and standardization, and provides traceability for each technical detail."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a3b8c8d5-ff3e-48ce-b0b8-29b6b1f517f8",
   "metadata": {},
   "source": [
    "## Use Case Overview\n",
    "\n",
    "### Problem\n",
    "Datasheets like that for the LM317 regulator are often distributed as PDFs containing multiple tables, charts, and complex textual descriptions. Engineers must manually extract technical details such as voltage ranges, dropout voltage, maximum current, input voltage range, and pin configurations. This process is error-prone and time-consuming.\n",
    "\n",
    "### Agent Workflow (Combination of Automation and Chat)\n",
    "1. **Upload Datasheet:** The engineer uploads the LM317 datasheet PDF.  \n",
    "2. **Structured Extraction:** An automated agent processes the PDF and extracts key technical details into structured fields (e.g., output voltage as a range with separate min/max values).\n",
    "3. **Interactive Verification:** The engineer can query the agent (via chat) for further details or clarification (e.g., \"Show me the detailed pin configuration extraction\") and review the cited pages.\n",
    "\n",
    "**Value Delivered:**\n",
    "- Up to 70% reduction in manual data extraction time.\n",
    "- Increased accuracy and standardization with structured fields."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a704e843-54be-4969-842b-713584cb3c35",
   "metadata": {},
   "source": [
    "## Setup and Download Data\n",
    "\n",
    "Download the [LM317 Datasheet](https://www.ti.com/lit/ds/symlink/lm317.pdf) and setup LlamaExtract."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6e5b1f91-8785-44d4-a710-8be1b48b76de",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p data/lm317_structured_extraction\n",
    "!wget https://www.ti.com/lit/ds/symlink/lm317.pdf -O data/lm317_structured_extraction/lm317.pdf"
   ]
  },
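  {
   "cell_type": "markdown",
   "id": "lm317-sketch-pdf-check-md",
   "metadata": {},
   "source": [
    "The download can fail quietly, for example when the server returns an HTML error page instead of the PDF. A minimal sketch of a sanity check, assuming the file was saved to the path used by the `wget` command above: it reads the first bytes of the file and confirms the PDF signature `%PDF-`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lm317-sketch-pdf-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pathlib import Path\n",
    "\n",
    "# Path used by the wget command in the download cell\n",
    "pdf_path = Path(\"data/lm317_structured_extraction/lm317.pdf\")\n",
    "\n",
    "if not pdf_path.exists():\n",
    "    print(f\"Missing file: {pdf_path}\")\n",
    "else:\n",
    "    # PDF files start with the signature bytes %PDF-\n",
    "    with pdf_path.open(\"rb\") as f:\n",
    "        header = f.read(5)\n",
    "    if header == b\"%PDF-\":\n",
    "        print(f\"Looks like a PDF ({pdf_path.stat().st_size} bytes)\")\n",
    "    else:\n",
    "        print(\"File does not start with %PDF-; the download may have returned an error page\")"
   ]
  },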
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f17b914a-00ed-4b63-8198-69fd7c4a7c62",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dotenv import load_dotenv\n",
    "from llama_cloud_services import LlamaExtract\n",
    "from llama_cloud.core.api_error import ApiError\n",
    "\n",
    "# Load environment variables (ensure LLAMA_CLOUD_API_KEY is set in your .env file)\n",
    "load_dotenv(override=True)\n",
    "\n",
    "# Initialize the LlamaExtract client\n",
    "llama_extract = LlamaExtract(\n",
    "    project_id=\"<project_id>\",\n",
    "    organization_id=\"<organization_id>\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ed9f6e9a-96c8-4ee1-8b45-0b6a4f7dbbf1",
   "metadata": {},
   "source": [
    "## Defining a Structured Extraction Schema\n",
    "\n",
    "We now define a rich Pydantic schema to extract technical specifications from the LM317 datasheet. In this schema:\n",
    "\n",
    "- The **output_voltage** and **input_voltage** fields are structured as ranges with separate minimum and maximum values and a unit.\n",
    "- The **pin_configuration** field is structured to include a pin count and a descriptive layout.\n",
    "- Additional technical fields (e.g., dropout voltage, max current) are captured as numbers.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4f7e9b44-5e69-4b30-9864-cd98f1e2a7d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import BaseModel, Field\n",
    "from typing import List\n",
    "\n",
    "\n",
    "class VoltageRange(BaseModel):\n",
    "    min_voltage: float = Field(..., description=\"Minimum voltage in volts\")\n",
    "    max_voltage: float = Field(..., description=\"Maximum voltage in volts\")\n",
    "    unit: str = Field(\"V\", description=\"Voltage unit\")\n",
    "\n",
    "\n",
    "class PinConfiguration(BaseModel):\n",
    "    pin_count: int = Field(..., description=\"Number of pins\")\n",
    "    layout: str = Field(..., description=\"Detailed pin layout description\")\n",
    "\n",
    "\n",
    "class LM317Spec(BaseModel):\n",
    "    component_name: str = Field(..., description=\"Name of the component\")\n",
    "    output_voltage: VoltageRange = Field(\n",
    "        ..., description=\"Output voltage range specification\"\n",
    "    )\n",
    "    dropout_voltage: float = Field(..., description=\"Dropout voltage in volts\")\n",
    "    max_current: float = Field(..., description=\"Maximum current rating in amperes\")\n",
    "    input_voltage: VoltageRange = Field(\n",
    "        ..., description=\"Input voltage range specification\"\n",
    "    )\n",
    "    pin_configuration: PinConfiguration = Field(\n",
    "        ..., description=\"Pin configuration details\"\n",
    "    )\n",
    "    features: List[str] = Field([], description=\"List of additional technical features\")\n",
    "\n",
    "\n",
    "class LM317Schema(BaseModel):\n",
    "    specs: List[LM317Spec] = Field(\n",
    "        ..., description=\"List of extracted LM317 technical specifications\"\n",
    "    )"
   ]
  },
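  {
   "cell_type": "markdown",
   "id": "lm317-sketch-schema-check-md",
   "metadata": {},
   "source": [
    "Before creating the agent, it can help to confirm that the schema accepts and rejects the payloads you expect. The sketch below validates a hand-written sample (illustrative values typed in for this check, not extractor output) and then shows that a payload missing required fields raises a `ValidationError`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lm317-sketch-schema-check-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "from pydantic import ValidationError\n",
    "\n",
    "# Hand-written sample for illustration only; not produced by the extractor\n",
    "sample = {\n",
    "    \"specs\": [\n",
    "        {\n",
    "            \"component_name\": \"EXAMPLE-REG\",\n",
    "            \"output_voltage\": {\"min_voltage\": 1.0, \"max_voltage\": 10.0},\n",
    "            \"dropout_voltage\": 2.0,\n",
    "            \"max_current\": 1.0,\n",
    "            \"input_voltage\": {\"min_voltage\": 3.0, \"max_voltage\": 12.0},\n",
    "            \"pin_configuration\": {\"pin_count\": 3, \"layout\": \"1: A, 2: B, 3: C\"},\n",
    "        }\n",
    "    ]\n",
    "}\n",
    "\n",
    "parsed = LM317Schema(**sample)\n",
    "# unit falls back to its default because the sample omits it\n",
    "print(parsed.specs[0].output_voltage.unit)\n",
    "# features falls back to an empty list\n",
    "print(parsed.specs[0].features)\n",
    "\n",
    "# Dropping required fields should fail validation\n",
    "broken = {\"specs\": [{\"component_name\": \"EXAMPLE-REG\"}]}\n",
    "try:\n",
    "    LM317Schema(**broken)\n",
    "except ValidationError as err:\n",
    "    print(f\"Rejected as expected with {len(err.errors())} validation errors\")"
   ]
  },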
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e0508e38-35be-446c-afe7-129e39553281",
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    existing_agent = llama_extract.get_agent(name=\"lm317-datasheet\")\n",
    "    if existing_agent:\n",
    "        llama_extract.delete_agent(existing_agent.id)\n",
    "except ApiError as e:\n",
    "    if e.status_code == 404:\n",
    "        pass\n",
    "    else:\n",
    "        raise"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bb197dfd-dd37-459e-8953-cc1b12f25bdd",
   "metadata": {},
   "source": [
    "Here we use our balanced extraction mode."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e3defc0a-c685-4fbd-bbb1-1270f1442e72",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_cloud import ExtractConfig\n",
    "\n",
    "extract_config = ExtractConfig(\n",
    "    extraction_mode=\"BALANCED\",\n",
    ")\n",
    "\n",
    "agent = llama_extract.create_agent(\n",
    "    name=\"lm317-datasheet\", data_schema=LM317Schema, config=extract_config\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c0a0f9f9-2ef3-4a38-bd74-68d2c2e9e2d8",
   "metadata": {},
   "source": [
    "## Extracting Information from the LM317 Datasheet\n",
    "\n",
    "For this demonstration, please download a publicly available LM317 voltage regulator datasheet (for example, from Texas Instruments) and save it as `lm317.pdf` in the `./data` directory. Then run the cell below to extract the structured technical specifications."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c58e8b7a-8f9b-46f3-8f72-3c2f96b49e8f",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Uploading files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,  1.08s/it]\n",
      "Creating extraction jobs: 100%|████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  1.96it/s]\n",
      "Extracting files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:27<00:00, 87.38s/it]\n"
     ]
    }
   ],
   "source": [
    "# Path to the LM317 datasheet PDF\n",
    "lm317_pdf = \"./data/lm317_structured_extraction/lm317.pdf\"\n",
    "\n",
    "# Extract structured technical specifications from the datasheet\n",
    "lm317_extract = agent.extract(lm317_pdf)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a2e2e44-6c48-4a38-a6de-5f2f3c7d4d8b",
   "metadata": {},
   "source": [
    "## Assessing the Extraction Results\n",
    "\n",
    "The output will be a consolidated list of LM317 technical specifications. For each entry, you should see structured fields including:\n",
    "\n",
    "- **component_name**\n",
    "- **output_voltage** as a range (with separate `min_voltage` and `max_voltage` plus `unit`)\n",
    "- **dropout_voltage** and **max_current** as numbers\n",
    "- **input_voltage** as a structured range\n",
    "- **pin_configuration** with a `pin_count` and `layout`\n",
    "- **features** (if available)\n",
    "\n",
    "This structured approach makes it easier to standardize the information for downstream integration and verification. Engineers can click on the cited page numbers (in a UI that supports it) to validate the extraction."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fb2abc44-7c9b-4b19-958e-d0d7b390ae57",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'specs': [{'component_name': 'LM317',\n",
       "   'output_voltage': {'min_voltage': 1.25, 'max_voltage': 37.0, 'unit': 'V'},\n",
       "   'dropout_voltage': 0.0,\n",
       "   'max_current': 1.5,\n",
       "   'input_voltage': {'min_voltage': 4.25, 'max_voltage': 40.0, 'unit': 'V'},\n",
       "   'pin_configuration': {'pin_count': 3,\n",
       "    'layout': '1: ADJUST, 2: OUTPUT, 3: INPUT'},\n",
       "   'features': ['Output voltage range adjustable from 1.25 V to 37 V',\n",
       "    'Output current greater than 1.5 A',\n",
       "    'Internal short-circuit current limiting',\n",
       "    'Thermal overload protection',\n",
       "    'Output safe-area compensation']}]}"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Display the extraction results\n",
    "lm317_extract.data"
   ]
  },
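  {
   "cell_type": "markdown",
   "id": "lm317-sketch-plausibility-md",
   "metadata": {},
   "source": [
    "Structured output still benefits from a plausibility pass. The sketch below re-validates `lm317_extract.data` against `LM317Schema` and flags values that look wrong. The rules used here (minimum not above maximum, positive dropout voltage and current) are illustrative assumptions, not limits taken from the datasheet."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lm317-sketch-plausibility-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Re-validate the raw dict against the schema defined earlier\n",
    "validated = LM317Schema(**lm317_extract.data)\n",
    "\n",
    "for spec in validated.specs:\n",
    "    issues = []\n",
    "    ranges = {\n",
    "        \"output_voltage\": spec.output_voltage,\n",
    "        \"input_voltage\": spec.input_voltage,\n",
    "    }\n",
    "    for name, rng in ranges.items():\n",
    "        if rng.min_voltage > rng.max_voltage:\n",
    "            issues.append(f\"{name}: min exceeds max\")\n",
    "    # A value of 0 for a required numeric field may be a placeholder\n",
    "    if spec.dropout_voltage <= 0:\n",
    "        issues.append(\"dropout_voltage is not positive; check the datasheet\")\n",
    "    if spec.max_current <= 0:\n",
    "        issues.append(\"max_current is not positive; check the datasheet\")\n",
    "    print(spec.component_name, issues or \"no issues flagged\")"
   ]
  },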
  {
   "cell_type": "markdown",
   "id": "c7a2a523-095e-40bf-b713-f509c13a7747",
   "metadata": {},
   "source": [
    "You can also see the output result in the UI."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dc22dfa5-b667-4fb0-8dbe-24e401b12389",
   "metadata": {},
   "source": [
    "![](data/lm317_structured_extraction/lm317_extraction.png)"
   ]
  },
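  {
   "cell_type": "markdown",
   "id": "lm317-sketch-flatten-md",
   "metadata": {},
   "source": [
    "To feed a specification sheet, the nested result can be flattened into one row per component. A minimal sketch, assuming `lm317_extract.data` has the dictionary shape shown above and that all voltages are in volts; the column names and the helper `flatten_spec` are hypothetical choices for this example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "lm317-sketch-flatten-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "\n",
    "\n",
    "def flatten_spec(spec: dict) -> dict:\n",
    "    # Flatten one nested spec dict into a single spreadsheet row.\n",
    "    # Column names are hypothetical; the _v suffix assumes the unit is volts.\n",
    "    return {\n",
    "        \"component_name\": spec[\"component_name\"],\n",
    "        \"vout_min_v\": spec[\"output_voltage\"][\"min_voltage\"],\n",
    "        \"vout_max_v\": spec[\"output_voltage\"][\"max_voltage\"],\n",
    "        \"vin_min_v\": spec[\"input_voltage\"][\"min_voltage\"],\n",
    "        \"vin_max_v\": spec[\"input_voltage\"][\"max_voltage\"],\n",
    "        \"dropout_v\": spec[\"dropout_voltage\"],\n",
    "        \"max_current_a\": spec[\"max_current\"],\n",
    "        \"pin_count\": spec[\"pin_configuration\"][\"pin_count\"],\n",
    "        \"pin_layout\": spec[\"pin_configuration\"][\"layout\"],\n",
    "        \"features\": \"; \".join(spec.get(\"features\") or []),\n",
    "    }\n",
    "\n",
    "\n",
    "rows = [flatten_spec(spec) for spec in lm317_extract.data[\"specs\"]]\n",
    "\n",
    "if rows:\n",
    "    # Write to an in-memory buffer; swap in a file handle to save to disk\n",
    "    buffer = io.StringIO()\n",
    "    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))\n",
    "    writer.writeheader()\n",
    "    writer.writerows(rows)\n",
    "    print(buffer.getvalue())"
   ]
  },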
  {
   "cell_type": "markdown",
   "id": "e0e0c12a-9f89-4bb3-b40d-3e9f7c6d2fef",
   "metadata": {},
   "source": [
    "## Conclusion\n",
    "\n",
    "This notebook demonstrated how to use LlamaExtract with a structured extraction schema for the LM317 voltage regulator datasheet. By defining detailed subfields (such as splitting voltage ranges into minimum and maximum values, and structuring the pin configuration), we ensure that the extracted data is standardized and traceable through page citations. This approach minimizes manual effort and improves accuracy, providing a robust example of an agentic document workflow for technical documentation processing.\n",
    "\n",
    "Feel free to modify or extend the schema to capture additional technical details or to suit your own use cases."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "llama_parse",
   "language": "python",
   "name": "llama_parse"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
